KLEE: A Framework for Distributed Top-k Query Algorithms
Abstract
This paper addresses the efficient processing of top-k queries in wide-area distributed data repositories, where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption, and local peer work. We present KLEE, a novel algorithmic framework for distributed top-k queries designed for high performance and flexibility. KLEE makes a strong case for approximate top-k algorithms over widely distributed data sources: it shows how large efficiency gains can be obtained at a small penalty in result quality. Further, KLEE gives the query-initiating peer the flexibility to trade off result quality against expected performance, and to trade off the number of communication phases engaged during query execution against network bandwidth consumption. We have implemented KLEE and related algorithms and conducted a comprehensive performance evaluation using large real-world and synthetic web-data collections and query benchmarks. Our experimental results show that KLEE can achieve major performance gains in network bandwidth and query response time, with much lighter peer loads, all at the cost of small errors in result precision and other result-quality measures.
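The abstract states the problem setting but not KLEE's internals. As a point of reference only, the sketch below implements TPUT (Three-Phase Uniform Threshold, due to Cao and Wang), a well-known exact algorithm for the same setting; it is not KLEE, and KLEE's approximation techniques are not shown. Assumptions, all hypothetical: each peer is a Python dict mapping item ids to nonnegative local scores, an item's aggregate score is the sum of its local scores (missing entries count as 0), and the names `tput_top_k`, `kth_largest`, and `partial_sums` are illustrative.

```python
from heapq import nlargest

# Hypothetical data model: one dict per peer, mapping item id -> nonnegative local score.
# An item's aggregate score is the sum of its local scores (missing entries count as 0).

def kth_largest(values, k):
    """Return the k-th largest value, or 0.0 if there are fewer than k values."""
    top = nlargest(k, values)
    return top[-1] if len(top) == k else 0.0

def partial_sums(seen):
    """Lower bounds on aggregate scores: the sum of the local scores received so far."""
    return [sum(scores.values()) for scores in seen.values()]

def tput_top_k(peers, k):
    """Exact top-k by summed score using TPUT's three communication phases (not KLEE)."""
    m = len(peers)
    seen = {}  # item -> {peer index: local score received by the coordinator}

    # Phase 1: every peer ships its local top-k entries.
    for i, peer in enumerate(peers):
        for item, score in nlargest(k, peer.items(), key=lambda kv: kv[1]):
            seen.setdefault(item, {})[i] = score
    tau1 = kth_largest(partial_sums(seen), k)

    # Phase 2: every peer ships all entries whose local score is at least tau1 / m.
    threshold = tau1 / m
    for i, peer in enumerate(peers):
        for item, score in peer.items():
            if score >= threshold:
                seen.setdefault(item, {})[i] = score
    tau2 = kth_largest(partial_sums(seen), k)

    # An entry a peer did not ship lies below the threshold, which bounds the unknown part.
    candidates = [
        item for item, scores in seen.items()
        if sum(scores.values()) + (m - len(scores)) * threshold >= tau2
    ]

    # Phase 3: fetch exact scores for the surviving candidates only.
    exact = {item: sum(peer.get(item, 0) for peer in peers) for item in candidates}
    return nlargest(k, exact.items(), key=lambda kv: kv[1])

peers = [
    {"a": 10, "b": 8, "c": 1, "d": 3},
    {"a": 1, "b": 9, "c": 7, "d": 2},
    {"a": 5, "b": 1, "c": 6, "d": 8},
]
print(tput_top_k(peers, 2))  # [('b', 18), ('a', 16)]
```

TPUT always needs three round trips and returns exact results; the abstract's point is that an approximate framework such as KLEE can let the initiating peer trade some of this communication for a small loss in result quality.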
Similar resources
Unified Framework for Top-k Query Processing in Peer-to-Peer Networks
Supporting queries over dispersed data stored in large-scale distributed systems, such as peer-to-peer networks, naturally calls for ranked retrieval in order to effectively focus on the most relevant (i.e., top-k) results. While top-k retrieval has been actively studied lately, existing algorithms are too restrictive due to their assumptions about how the data is partitioned amongst the variou...
Spatio-Temporal Query Processing in Smartphone Networks
In this position paper, we present a powerful and distributed spatio-temporal query processing framework, coined HUB-K. Our framework can be utilized to promptly answer queries of the form: “Report the objects (i.e., trajectories) that follow a similar spatio-temporal motion to Q, where Q is some query trajectory.” HUB-k relies on an in-situ data storage model, where spatio-temporal data remai...
Top-k aggregation queries in large-scale distributed systems
Distributed top-k query processing has become an essential functionality in a large number of emerging application classes like Internet traffic monitoring and Peer-to-Peer Web search. This work addresses efficient algorithms for distributed top-k queries in wide-area networks where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers.
Answering Why-not Questions on Reverse Top-k Queries
Why-not questions, which aim to seek clarifications on the missing tuples for query results, have recently received considerable attention from the database community. In this paper, we systematically explore why-not questions on reverse top-k queries, owing to their importance in multi-criteria decision making. Given an initial reverse top-k query and a missing/why-not weighting vector set Wm th...
Search for the Best but Expect the Worst - Distributed Top-k Queries over Decreasing Aggregated Scores
We consider distributed top-k queries in wide-area networks where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers. In contrast to existing work, we exclusively consider distributed top-k queries over decreasing aggregated values. State-of-the-art distributed top-k algorithms usually depend on threshold propagation to reduce expen...